Next-generation sequencing data quality
Next-generation sequencing technology has been very widely used in
biological/medical research. For instance, it has been combined with
transposon technology to produce a new method for mutation
analysis. The new method is called transposon sequencing
technology. Using this technology, millions of mutants can be identified
in a single experiment. Based on the gene-wise transposon statistics,
essential genes can be identified using the density pattern analysis
approaches discussed in Chapter 2 of this book. However, most existing
approaches for identifying essential genes are based on replicate-free data.
The assumption is that a single transposon sequencing dataset is noise-free
or that the noise level can be well controlled. However, it has been recognised
that transposon insertions are random events [Golden, et al., 2000;
h, et al., 2014; Baym, et al., 2016]. In other words, in replicated
transposon sequencing data, it is very unlikely that there will be identical
patterns of transposon insertion distributions across replicates. An
insertion site in one replicate may not be present in other replicates in the
same experiment. Even when a site is present across replicates, the insertion
is less likely to fall on exactly the same base pair in each replicate.
It has been assumed that a well-designed study will ensure that all
transposon insertion sites in a target genome are covered in replicated data.
In other words, the number of replicates is assumed to be the minimum needed to
cover all transposon insertion sites. In addition, the insertion frequency is
a random event and will differ between replicates at each insertion site. It
is therefore a reasonable assumption that unobserved but true transposon
insertion frequencies can be discovered if replicated data are used. Based on
this assumption, a novel and more efficient approach may need to be
developed to deal with sequencing data noise for better essential gene
identification. This kind of thinking can also be applied to other areas
dealing with sequencing data, such as sequence assembly, where it
been exercised for replicate-free data. In fact, the de novo